An Augmentation Hybrid System for Document Classification and Rating

Authors

  • Richard Dazeley
  • Byeong Ho Kang
Abstract

This paper introduces an augmentation hybrid system, referred to as Rated MCRDR. It uses Multiple Classification Ripple Down Rules (MCRDR), a simple and effective knowledge acquisition technique, combined with a neural network.

Introduction

As we move from the Information Age to the Age of Information Overload, Information Filtering (IF) has gained significant attention in the research community. This paper briefly introduces a new method based on a variant of the Multiple Classification Ripple Down Rules (MCRDR) methodology, called Rated MCRDR (RM) [1]. Rated MCRDR is an augmentation hybrid intelligent system developed to provide both classifications and a relevance ranking of cases, and it can be applied in many domains [1]. Information filtering was one of the key areas the algorithm was designed for, and the algorithm draws heavily on ideas found in information filtering research. The main idea behind the system is to reduce the feature space significantly, to a size that a neural network is capable of handling, without effectively losing any relevant information.

Rated MCRDR (RM)

To achieve this, RM adopts the basic premise that, while the majority of features may be statistically relevant [2], it is safe to assume that an individual user is not interested in all the possible features. Therefore, RM attempts to identify keywords, groups of words, phrases, or even compressed features output by some other feature reduction method, through simple user interrogation using Multiple Classification Ripple Down Rules (MCRDR) [3]. This incremental Knowledge Acquisition (KA) methodology allows a user to perform both the KA process and the maintenance of a Knowledge Based System (KBS) over time [3].
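As an illustration of how user-supplied rules can shrink a document to a handful of features, the following is a minimal sketch, not the RM implementation. The `Rule` class, the `classify` and `to_feature_vector` helpers, and the example keywords are all assumptions introduced here. The inference is a simplified form of MCRDR in which, on every path through the rule tree, the deepest satisfied rule supplies the conclusion, so one document can receive several classifications.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One node of a simplified MCRDR-style rule tree (hypothetical structure)."""
    keywords: frozenset   # every keyword must occur in the document for the rule to fire
    conclusion: str       # classification given when no refinement below it fires
    children: list = field(default_factory=list)  # refinements added later by the user

    def fires(self, words):
        return self.keywords <= words


def classify(rule, words):
    """Collect the conclusion of the deepest satisfied rule on every path."""
    fired = [child for child in rule.children if child.fires(words)]
    if not fired:
        # No refinement applies, so this rule's own conclusion stands.
        return [rule.conclusion] if rule.conclusion else []
    conclusions = []
    for child in fired:
        conclusions.extend(classify(child, words))
    return conclusions


def to_feature_vector(conclusions, known_classes):
    """Encode the fired classifications as a small binary input vector."""
    return [1.0 if c in conclusions else 0.0 for c in known_classes]


# Illustrative rule base: the root always fires and carries no conclusion.
root = Rule(frozenset(), "")
sport = Rule(frozenset({"football"}), "sport")
sport.children.append(Rule(frozenset({"football", "transfer"}), "sport-business"))
root.children += [sport, Rule(frozenset({"election"}), "politics")]

doc = set("the football transfer window closes before the election".split())
labels = classify(root, doc)
vector = to_feature_vector(labels, ["sport", "sport-business", "politics"])
print(labels, vector)
```

In this sketch the network input has one entry per known classification rather than one per word in the vocabulary, which is the kind of reduction the text describes. How RM itself encodes rule outcomes for its network is not specified here.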
The basic concept behind MCRDR is to use the user's knowledge within the context in which it is provided [1, 3] to produce multiple classifications for an individual document. Therefore, if the expert disagrees with one or more of the conclusions found by the system, knowledge can easily be added to improve future results. RM then learns further information, through observing user behaviour, about the relationships between groups of identified features, both to capture a deeper sociological meaning behind the selected features and to associate a set of relevance rankings. When a new feature or set of features is identified by the user, the specifically designed neural network immediately steps to a rating that accurately reflects its relevance to the user. After the initial learning step, any further documents receiving the same classification allow the network to learn more intricate non-linear relationships. Thus, RM is able to learn classifications for documents if required, as well as both linear and non-linear ratings. The remainder of this paper will discuss RM in detail.

Fig. 1. Ability of RM to order cases according to the simulated user's preference. a) RM performance on the first document set, prior to any training. b) RM performance on the fifth document set, after 5 document sets. (Horizontal axis: order of documents by RM, 1 to 49; vertical axis: user rating, -0.5 to 0.5.)

Footnote 1: Collaborative research project between both institutions.

Results and Discussion

The system has undergone preliminary testing with a simulated expert using a randomly generated data set.
Figure 1 illustrates how RM was able to place the documents with a higher relevance to the user first after seeing only 5 groups of 50 documents. These tests were done primarily to show that the system was able to learn quickly, and for parameter tuning purposes. Clearly, a more rigorous testing regime is needed to fully justify the algorithm's ability to learn within the information domain.

References

1. R. Dazeley and B. H. Kang. Rated MCRDR: Finding Non-Linear Relationships Between Classifications in MCRDR. In 3rd International Conference on Hybrid Intelligent Systems. 2003. Melbourne, Australia: IOS Press.
2. T. Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In European Conference on Machine Learning (ECML). 1998: Springer.
3. B. H. Kang. Validating Knowledge Acquisition: Multiple Classification Ripple Down Rules. 1996, University of New South Wales: Sydney.

Similar resources

A hybridization of evolutionary fuzzy systems and ant Colony optimization for intrusion detection

A hybrid approach for intrusion detection in computer networks is presented in this paper. The proposed approach combines an evolutionary-based fuzzy system with an Ant Colony Optimization procedure to generate high-quality fuzzy-classification rules. We applied our hybrid learning approach to network security and validated it using the DARPA KDD-Cup99 benchmark data set. The results indicate t...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Document Analysis And Classification Based On Passing Window

In this paper we present a document analysis and classification system to segment and classify the contents of Arabic document images. This system includes preprocessing, document segmentation, feature extraction and document classification. A document image is enhanced in preprocessing by removing noise, binarization, and detecting and correcting image skew. In document segmentation, an algorith...

An Ensemble Click Model for Web Document Ranking

Annually, web search engine providers spend more and more money on document ranking in search engine result pages (SERPs). Click models provide advantageous information for ranking documents in SERPs by modeling interactions between users and search engines. Here, three modules are employed to create a hybrid click model; the first module is a PGM-based click model, the second module in a d...

Publication date: 2004